10 research outputs found

    Achieving Synergy in Cognitive Behavior of Humanoids via Deep Learning of Dynamic Visuo-Motor-Attentional Coordination

    Full text link
The current study examines how adequate coordination among different cognitive processes, including visual recognition, attention switching, and action preparation and generation, can be developed through robot learning, introducing a novel model, the Visuo-Motor Deep Dynamic Neural Network (VMDNN). The proposed model is built on the coupling of a dynamic vision network, a motor generation network, and a higher-level network placed on top of these two. Simulation experiments using the iCub simulator were conducted on cognitive tasks including visual object manipulation in response to human gestures. The results showed that synergetic coordination can be developed via iterative learning through the whole network when a spatio-temporal hierarchy and a temporal hierarchy self-organize in the visual pathway and the motor pathway, respectively, such that the higher level can manipulate them with abstraction.
    Comment: Submitted to the 2015 IEEE-RAS International Conference on Humanoid Robots
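
    A minimal sketch of this kind of coupling, assuming a PyTorch implementation with hypothetical layer sizes and GRU cells standing in for the dynamic neural networks used in the paper (the abstract does not specify the exact units), might look like this:

```python
import torch
import torch.nn as nn

class CoupledVisuoMotorNet(nn.Module):
    """Illustrative coupling of a vision pathway, a motor pathway,
    and a higher-level network placed on top of both (hypothetical sizes)."""

    def __init__(self, img_feat=256, motor_dim=10, hidden=128, top=64):
        super().__init__()
        # Vision pathway: convolutional front end plus a recurrent layer
        self.vision_cnn = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(32 * 4 * 4, img_feat),
        )
        self.vision_rnn = nn.GRUCell(img_feat, hidden)
        # Motor pathway: recurrent layer driven by the joint state
        self.motor_rnn = nn.GRUCell(motor_dim, hidden)
        # Higher-level network coupling the two pathways
        self.top_rnn = nn.GRUCell(2 * hidden, top)
        self.top_to_vision = nn.Linear(top, hidden)
        self.top_to_motor = nn.Linear(top, hidden)
        self.motor_out = nn.Linear(hidden, motor_dim)

    def step(self, image, joints, h_v, h_m, h_t):
        # Bottom-up: encode the current image and joint state
        h_v = self.vision_rnn(self.vision_cnn(image), h_v)
        h_m = self.motor_rnn(joints, h_m)
        # The higher level integrates both pathways ...
        h_t = self.top_rnn(torch.cat([h_v, h_m], dim=-1), h_t)
        # ... and modulates them top-down, coordinating vision and action
        h_v = h_v + self.top_to_vision(h_t)
        h_m = h_m + self.top_to_motor(h_t)
        return self.motor_out(h_m), h_v, h_m, h_t
```

    Training such a coupled model end to end on paired visual and motor sequences is what would let the top-level state learn the kind of abstract coordination described above.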

    Agreement Study Using Gesture Description Analysis

    Get PDF
Choosing adequate gestures for touchless interfaces is a challenging task that has a direct impact on human-computer interaction. Such gestures are commonly determined by the designer using ad-hoc, rule-based, or agreement-based methods. Previous approaches to assessing agreement grouped gestures into equivalence classes and ignored the integral properties that they share. In this work, we propose a generalized framework that inherently incorporates the gesture descriptors into the agreement analysis (GDA). In contrast to previous approaches, we represent gestures using binary description vectors and allow them to be partially similar. In this context, we introduce a new metric, referred to as the Soft Agreement Rate (SAR), to measure the level of agreement, and we provide a mathematical justification for it. Further, we performed computational experiments to study the behavior of SAR and demonstrate that existing agreement metrics are a special case of our approach. Our method was evaluated through a guessability study conducted with a group of neurosurgeons; nevertheless, our formulation can be applied to any other user-elicitation study. Results show that the level of agreement obtained by SAR is 2.64 times higher than that obtained by previous metrics. Finally, we show that our approach complements existing agreement techniques by generating an artificial lexicon based on the most agreed-upon properties.
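
    The abstract does not give the SAR formula, but the core idea, scoring partial overlap between binary description vectors rather than requiring exact matches, can be illustrated with a Jaccard-style pairwise similarity. The function below is a hypothetical sketch of that idea, not the paper's exact metric:

```python
import numpy as np
from itertools import combinations

def soft_agreement(proposals):
    """Hypothetical soft agreement over binary gesture descriptors.

    `proposals` has shape (n_participants, n_descriptors), with a 1
    wherever a participant's proposed gesture exhibits that property.
    Classic agreement rates count two proposals as identical or not;
    here every pair contributes its Jaccard similarity instead.
    """
    proposals = np.asarray(proposals, dtype=bool)
    sims = []
    for a, b in combinations(proposals, 2):
        union = np.logical_or(a, b).sum()
        inter = np.logical_and(a, b).sum()
        sims.append(1.0 if union == 0 else inter / union)
    return float(np.mean(sims)) if sims else 1.0

# Three participants describing the same referent with 5 binary properties
print(soft_agreement([[1, 1, 0, 0, 1],
                      [1, 1, 0, 1, 1],
                      [1, 0, 0, 0, 1]]))
```

    Under such a scheme, identical vectors score 1 and disjoint vectors score 0, so an equivalence-class agreement rate falls out as the special case in which only exact matches count, which mirrors the claim above that existing metrics are a special case of the descriptor-based analysis.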

    Fazt: Few and Zero-Shot Framework to Learn Tempo-Visual Events from Little or no Data

    No full text
Supervised classification methods based on deep learning have achieved great success in many domains and tasks that were previously unimaginable. Such approaches build on learning paradigms that require hundreds of examples to learn to classify objects or events, so their immediate application to domains with few or no observations is limited. This is because they lack the ability to rapidly generalize to new categories from a few examples or from high-level descriptions of categories, which can be attributed to the significant gap between the way machines represent knowledge and the way humans represent categories in their minds and learn to recognize them. In this context, this research represents categories as semantic trees in a high-level attribute space and proposes an approach that utilizes these representations to conduct N-Shot, Few-Shot, One-Shot, and Zero-Shot Learning (ZSL). This work refers to this paradigm as the problem of general classification (GCP) and proposes a unified framework for GCP referred to as the Few and Zero-Shot Technique (FAZT). The FAZT framework is an end-to-end approach that uses trainable 3D convolutional neural networks and recurrent neural networks to simultaneously optimize for both the semantic and the classification tasks. Lastly, the problem of systematically obtaining semantic attributes by utilizing domain-specific ontologies is presented.

    The proposed framework is validated in the domains of hand gesture and action/activity recognition; however, this research can be applied to other domains such as video understanding, the study of human behavior, and emotion recognition. First, an attribute-based dataset for gestures is developed in a systematic manner by relying on the literature on gestures and semantics, and on crowdsourcing platforms such as Amazon Mechanical Turk. To the best of our knowledge, this is the first ZSL dataset for hand gestures (ZSGL dataset). Next, our framework is evaluated in two experimental conditions: 1. within-category (to test attribute recognition) and 2. across-category (to test the ability to recognize an unknown category). In addition, we conducted experiments in zero-shot, one-shot, few-shot, and continuous learning conditions, in both open-set and closed-set scenarios. Results showed that our framework performs favorably on the ZSGL, Kinetics, UIUC Action, UCF101, and HMDB51 action datasets in all experimental conditions.
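
    As a concrete illustration of the zero-shot step described above, the sketch below assumes an attribute head (e.g. the 3D-CNN/RNN backbone) has already produced per-clip attribute scores, and assigns each clip to the unseen class whose semantic descriptor best matches those scores; the function and class names are hypothetical and not taken from the FAZT code:

```python
import numpy as np

def zero_shot_classify(pred_attributes, class_descriptors):
    """Assign each sample to the unseen class whose binary semantic
    descriptor is closest (by cosine similarity) to the predicted attributes.

    pred_attributes: (n_samples, n_attributes) scores in [0, 1],
        e.g. sigmoid outputs of an attribute-prediction head.
    class_descriptors: dict mapping class name -> (n_attributes,) 0/1 vector.
    """
    names = list(class_descriptors)
    desc = np.stack([class_descriptors[n] for n in names]).astype(float)
    p = pred_attributes / (np.linalg.norm(pred_attributes, axis=1, keepdims=True) + 1e-8)
    d = desc / (np.linalg.norm(desc, axis=1, keepdims=True) + 1e-8)
    scores = p @ d.T                                  # (n_samples, n_classes)
    return [names[i] for i in scores.argmax(axis=1)]

# Example: one clip's attribute scores versus two unseen gesture classes
preds = np.array([[0.9, 0.1, 0.8, 0.2]])
classes = {"swipe_left": np.array([1, 0, 1, 0]),
           "pinch":      np.array([0, 1, 0, 1])}
print(zero_shot_classify(preds, classes))             # -> ['swipe_left']
```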

    Consensus measured by Metric I (state of the art) and Metric II (the Jaccard distance using semantic descriptors).

    No full text

    This form contains a list of 34 commands.

    No full text
    Each command is highlighted in gray. The rectangle to the left of the command corresponds to the context of the gesture, and the 2-4 rectangles to the right correspond to the modifiers.